Posting Paper on the Web

Author

  • William A. Barrett
Abstract

We present a document processing system that accepts scanned images of paper documents as input and outputs hyperlinked electronic documents. The system segments document images, separating text from graphics, recognizes text, and creates hypertext links between document components (text, images, graphics). By (1) limiting input to the popular Times-Roman and Helvetica fonts found in first-generation scans of columnated magazines and tabloids, (2) using gray-scale attributes, (3) using multiple character prototypes to recognize kerned and touching characters, (4) using a lexicon to find and correct recognition errors, and (5) providing user interaction to recognize problem words, we achieve OCR accuracies of up to 99.8%. This is close to our measured human proofreading accuracy (99.94%), although human proofreading takes six times longer. We also develop a simple method that automatically selects important words in a document and creates hypertext links from those words to other document components, providing high-level searching and browsing of the document. Browsing granularity (i.e., the number and types of linked words) is user-selectable. The appropriateness of automatically selected link anchors is comparable to human performance.
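As a rough illustration of steps (4) and (5) above, the sketch below checks each recognized word against a lexicon, substitutes the closest entry when one is similar enough, and flags the word for user review otherwise. It is not the system described in the abstract: the tiny `LEXICON`, the function name `correct_word`, the status labels, and the similarity cutoff are all assumptions made for this example, and the matching uses Python's standard `difflib` rather than whatever method the paper employs.

```python
import difflib

# Hypothetical mini-lexicon; a real system would load a full word list.
LEXICON = ["document", "hypertext", "image", "links",
           "recognition", "scanned", "system"]


def correct_word(word, lexicon=LEXICON, cutoff=0.75):
    """Return (word, status) for one OCR output word.

    status is "ok" if the word is in the lexicon, "corrected" if a
    sufficiently similar lexicon entry replaced it, and "needs_review"
    if no entry is close enough (the case left to the user).
    """
    lower = word.lower()
    if lower in lexicon:
        return word, "ok"
    # get_close_matches ranks candidates by difflib similarity ratio
    # and drops any that score below the cutoff.
    matches = difflib.get_close_matches(lower, lexicon, n=1, cutoff=cutoff)
    if matches:
        return matches[0], "corrected"
    return word, "needs_review"


if __name__ == "__main__":
    # "docurnent" imitates a typical OCR confusion of "m" with "rn".
    for token in ["scanned", "docurnent", "hypertcxt", "zzqq"]:
        print(token, "->", correct_word(token))
```

Lowering the cutoff corrects more words automatically but risks replacing a valid out-of-lexicon word (a name, for instance) with a wrong entry, which is why the fallback here defers to the user instead of guessing.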

Related articles

The ethics of DeCSS posting: towards assessing the morality of the Internet posting of DVD copyright circumvention software

Introduction. We investigate the conditions under which posting software known as "DeCSS" on the Internet is ethical. DeCSS circumvents the access and copy control protection measures on commercial DVDs. Through our investigation, we point to limitations in current frameworks used to assess ethical computer-based civil disobedience. Method. The paper draws on empirical findings of actual DeCSS ...

On Inverted Index Compression for Search Engine Efficiency

Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users’ queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression sch...

A Selfie is Worth a Thousand Words: Mining Personal Patterns behind User Selfie-posting Behaviours

Selfies have become increasingly fashionable in the social media era. People are willing to share their selfies on various social media platforms such as Facebook, Instagram and Flickr. The popularity of selfies has caught researchers' attention, especially that of psychologists. In computer vision and machine learning areas, little attention has been paid to this phenomenon as a valuable data source....

The More You Know: Information Effects on Job Application Rates by Gender in a Large Field Experiment

This paper presents the results from a 2.3 million person field experiment that varies whether a job seeker is shown the number of applicants for a job posting on a large job posting website, LinkedIn. This intervention increases the likelihood a person will start/finish an application by 0.6%-1.9%, representing an economically significant potential increase of over a thousand applications per ...

Five Years of Experience with a World-Wide-Web-based Job Directory for Neonatal-Perinatal Health Care

This poster describes the implementation and five-year utilization history of a free Web-based jobs directory used by health care professionals in neonatal-perinatal medicine. The World-Wide-Web site "Neonatology on the Web" (NOTW) has been in continuous operation since Fall of 1995. NOTW is dedicated to the information needs of professionals in neonatal-perinatal medicine and is the most heavi...

Development of Real Time Synchronous Web Application for Posting and Utilizing Disaster Information

In a large earthquake, rescue operations and fire-fighting are obstructed by fire spreading and street blockages. Therefore, it is important to quickly collect and utilize disaster information for disaster mitigation. In this paper, we first develop a Web application for posting and viewing information collected by users in real time. Using this system, it is possible not only to easily share...

Publication date: 2007